Local case-control sampling: Efficient subsampling in imbalanced data sets

Similar resources

Subsampling and reconstruction of bandlimited images with universal sampling sets

We investigate the subsampling and reconstruction of bandlimited images at universal sampling sets. Theoretically such sampling sets should guarantee the reconstruction of an image that is k-sparse in the DFT domain with only k samples. We find that, due to matrix conditioning issues, more than k samples are generally required, and we compare the reconstruction results to those from the sparse ...

A Case Study for Learning from Imbalanced Data Sets

We present our experience in applying a rule induction technique to an extremely imbalanced pharmaceutical data set. We focus on using a variety of performance measures to evaluate a number of rule quality measures. We also investigate whether simply changing the distribution skew in the training data can improve predictive performance. Finally, we propose a method for adjusting the learning al...

Neighbourhood sampling in bagging for imbalanced data

Various approaches to extending bagging ensembles for class-imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling. They also single out Roughly Balanced Bagging as the most accurate extension. Then, we point out that complex...

A Priori Synthetic Sampling for Increasing Classification Sensitivity in Imbalanced Data Sets

Class-imbalanced data usually suffer from intrinsic data properties beyond imbalance alone. The problem is intensified at higher levels of imbalance, which are most commonly found in observational studies. Extreme cases of class imbalance are common in many domains, including fraud detection, cancer mammography, and post-term births. These rare events are usually the most costly or have...

Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning

In recent years, mining imbalanced data sets has received increasing attention from both theoretical and practical perspectives. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, then summarizes the evaluation metrics and existing methods for evaluating and solving the imbalance problem. Synthetic minority oversampling technique (S...

Journal

Journal title: The Annals of Statistics

Year: 2014

ISSN: 0090-5364

DOI: 10.1214/14-aos1220